Emotion-Aware Speaker Identification with Transfer Learning
Authors
Abstract
Speech is a natural communication method used by humans. Speaker identification (SI) technology based on human speech has been used as an entry point for many human-computer-interaction applications. The performance of SI models can degrade when dealing with expressive speech uttered in emotional situations, because emotion databases do not have sufficient data to train the various emotional states. Generally, SI models are trained using relatively more samples of the "neutral" emotion than of other emotion classes. In this study, we propose an emotion-aware SI (em-SI) model that uses an emotion-embedding vector generated from a pre-trained speech emotion recognition (SER) model along with the acoustic features of the speech data. We assess the em-SI model on individual English and Korean emotional speech corpora and confirm that the proposed model provides improved performance on multilingual emotional speech corpora. The evaluation results show that the SI accuracy of the em-SI model on the Korean Emotion Multimodal Database (KEMDy19) improved by 3.2%, and the average speaker verification (SV) performance in terms of equal error rate (EER) improved by 1.3% compared to the baseline model. The visualization of the embedding vectors shows that the em-SI model maps speech to an embedding space where both speaker and emotion information are simultaneously represented. Through the experiments conducted in this study, we confirmed that the em-SI model, which learns by integrating speaker and emotion information, improves SI performance on emotional speech.
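The abstract reports speaker verification performance as an equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how an EER can be estimated from trial scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the threshold sweep, and the toy scores are all assumptions made for illustration.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """Estimate the EER from verification scores (illustrative helper).

    scores: similarity scores, higher means "more likely the same speaker".
    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    Returns (eer, threshold) at the threshold where FAR and FRR are closest.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_target = np.sum(labels == 1)
    n_nontarget = np.sum(labels == 0)

    best_gap, eer, best_thr = np.inf, None, None
    # Sweep every observed score as a candidate decision threshold.
    for thr in np.unique(scores):
        accept = scores >= thr
        far = np.sum(accept & (labels == 0)) / n_nontarget  # false acceptances
        frr = np.sum(~accept & (labels == 1)) / n_target    # false rejections
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer, best_thr = gap, (far + frr) / 2, thr
    return eer, best_thr


# Toy trial list (made-up numbers, not results from the paper).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
eer, thr = equal_error_rate(scores, labels)
print(f"EER = {eer:.2%} at threshold {thr}")
```

With a finite trial list the two error rates rarely cross exactly, so this sketch reports the average of FAR and FRR at the closest threshold; evaluation toolkits often interpolate along the ROC curve instead.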
Similar resources
Feature Transfer Learning for Speech Emotion Recognition
Speech Emotion Recognition (SER) has achieved some substantial progress in the past few decades since the dawn of emotion and speech research. In many aspects, various research efforts have been made in an attempt to achieve human-like emotion recognition performance in real-life settings. However, with the availability of speech data obtained from different devices and varied acquisition condi...
Cost-Sensitive Learning for Emotion Robust Speaker Recognition
In the field of information security, voice is one of the most important parts in biometrics. Especially, with the development of voice communication through the Internet or telephone system, huge voice data resources are accessed. In speaker recognition, voiceprint can be applied as the unique password for the user to prove his/her identity. However, speech with various emotions can cause an u...
Integrating speaker identification and learning with adaptive speech recognition
Presently, speaker adaptive systems are the state-of-the-art in automatic speech recognition. A general baseline model is adapted to the current speaker during recognition in order to improve the quality of the results obtained. However, the adaptation procedure needs to be able to distinguish between data from different speakers. Therefore, in a general speaker adaptive recognizer speaker recog...
Speaker Characteristics and Emotion Classification
In this paper, we address the — interrelated — problems of speaker characteristics (personalization) and suboptimal performance of emotion classification in state-of-the-art modules from two different points of view: first, we focus on a specific phenomenon (irregular phonation or laryngealization) and argue that its inherent multi-functionality and speaker-dependency makes its use as feature i...
Speaker Clustering in Emotion Recognition
Speaker variability is a known challenge for emotion recognition; however, little work has been done on speaker similarity in terms of its contribution to the performance in the emotion classification task. In this paper, we investigate this topic, and find a clear link between speaker proximity and the recognition accuracy. Motivated by this result, emotion based speaker clustering is proposed ...
Journal
Journal title: IEEE Access
Year: 2023
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2023.3297715